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Abstract 

We propose an ^i-penalized algorithm for fitting high-dimensional generalized linear mixed mod- 
els. Generalized linear mixed models (GLMMs) can be viewed as an extension of generalized linear 
models for clustered observations. This Lasso-type approach for GLMMs should be mainly used 
as variable screening method to reduce the number of variables below the sample size. We then 
suggest a refitting by maximum likelihood based on the selected variables only. This is an effective 
correction to overcome problems stemming from the variable screening procedure which are more 
severe with GLMMs. We illustrate the performance of our algorithm on simulated as well as on real 
data examples. Supplemental materials are available online and the algorithm is implemented in the 
R package glmmlasso. 

Key Words: coordinate gradient descent; Laplace approximation; random-effects model; 
variable selection. 
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1 Introduction 

In recent years, high-dimensional linear regression mo dels have been ext ensively studied. The most 
popular method to achieve sparse estimates is the Lasso (jTibshiranil . 1 1996). which uses an £1 -penalty. 
The Lasso is not only attractive in terms of its statistical properties but also due to its fast computa- 
tion solving a convex optimization problem. However, relati vely few articles examin e hig h-dimensional 
regres sion problems involving a non-c onvex loss func t ion, i. e. Khalili and Chenl (2007) a nd Stadler et al 



( 2010l ) for Gaussian mixture mod e ls, Pan and Shen ( 2007 ) and Witten and Tibshirani ( 2010l) for clus- 
tering and Witten and Tibshirani ( 20 111) for linear discriminant analysis. 



G eneralized linear mixed models (McCullagh and Nelder . 1989; Brcs low and Clavtonlll993 ; McCulloch and Searle 



2001 : Molenberghs and Verbeke . 2005tl are an extension of generalized linear models by adding random 
effects to the linear predictor in order to accommodate for clustered or overdispered data. These models 
have been received much attention in many applications such as biology, ecology, medicine, pharmaceuti- 
cal science and econometrics. Available software packages (lme4 in R, NLMIXED in SAS, among others) 
allow to fit a wide range of generalized linear mixed models. 

In this paper we develop a method for high-dimensional generalized linear mixed models. It is based 
on a Lasso-type regularization with a cyclic coordinate descent optimization. Due to shrinkage intro- 
duced by ^-penalization, our approach performs in a first step variable screening, thereby selecting a 
set of candidate active variables. In other words, the proposed method primarily aims at reducing the 
dimensionality of the high-dimensional GLMM. In a second step, we perform refitting by maximum like- 
lihood estimation to ge t accurate parame ter estimates. The idea of such a two-stag e approach has been 
used in linear mod els (lEfron et all 2004 ) and it i s relat ed to the adaptive Lasso (jZoul . l200otl and the 



thresholded Lasso (|Zhoul . 120101 : Ivan de Geer et all 120111 ). In fact, a two-stage approach is much more 



important than for linear models since shrinkage in GLMMs can have a severe effect on the estimation 
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of variance components, see Sections S] and [5j 

To the best of our knowledge, there does not exist any literature devoted to truly high-dimensional 
generalized linear mixed models. Some papers focus on pen alized variab l e selection procedure s in gener- 
alized mixe d models with low-dim e nsional data: we refer to Yane ( 2007 ) , Ibrahim et al.l ( 20101 ) , Ni et al 



(|2010h and lGroll and Tutd (|201lh . ISchelldorfer et al.l (|201lh present statistical theory and an algorithm 
for high-dimensional Gaussian linear mixed models, where computation is much easier than in the gen- 
eralized case. The main contribution of the present paper is the construction and implementation of an 
efficient algorithm for ^-penalization in truly high-di mensional gen eralized linear mixed models, called 
the GLMMLasso. We use the L aplace approximation ( Bates . 2011bl) and combine it with efficient coordi- 
nate gradient descent methods ( Tseng and Yunl . 2009h . Our algorithm is feasible for problems where the 
number of variables is in the thousands, and taking advantage of sparsity with respect to dimensionality 
(i.e. only few active variables) is exploited by an active set strategy. 

The rest of the article is organised as follows. In Section [3J we review the generalized linear mixed 
model and introduce the GLMMLasso estimator. In Section [31 we describe the details of the computa- 
tional algorithm before advocating the two-stage GLMMLasso estimators in Section [U In Section [5] and 
[6] we consider the performance of our methods on simulated and real data sets. The article concludes 
with a discussion in Section [71 Supplemental materials including additional simulation examples are 
available online. 



2 Generalized linear mixed models and ^-penalized estimation 



In this section, we first look at the classical GLMM setting where the num ber of observations is larger 
than the number of covariates, i.e. p < n. We closely follow [Bates! ( 2011a ). Secondly, we consider the 
high-dimensional framework, i.e. n<p, and present the ^i-penalized maximum likelihood estimator. 



2.1 Model formulation 

Suppose that the observations are not independent but grouped instead. Let r = 1,. .., N be the 
grouping index and j = 1, . . . , n r the jth outcome within group r. Denote by n the total number of 
observations, i.e. n — ^ r=1 n r . Let X be the n x p fixed-effects design matrix, Z the n x q random- 
effects design matrix, y the n-dimensional random response vector and 3 be the g-dimensional vector 
of random-effects. We observe y of y whereas 3 is unobserved. The generalized linear mixed model is 
specified by the unconditional distribution of 3 and the conditional distribution of y\3 = b: 

i) y%\3 = b are independent for i = 1, . . . , n. 

ii) The distribution of yi\3 = b belongs to the exponential family with density 

expj^ 1 ^ -6(6)) +c(yi,<j>)}, 

where b(.) and c(., .) are known functions. (f> is the dispersion parameter (known or unknown) and 
£i is associated with the conditional mean fii := E[yi\3 = 6], i.e. £j = 

iii) The conditional mean vector fi depends on b through the known link function g and the linear 
predictor r) — X(3 + Zb, with 77 = g(fi) componentwise. Here, /3 is the unknown p-dimensional 
parameter vector, called fixed effects, and b the unknown g-dimensional vector of random effects. 

iv) 3 ~ J\f q (0, Se) where the covariance matrix E# is parametrized by the unknown parameter vector 
8 S M. d . We assume that So is positive semidefinite, i.e. X# > 0. d is typically small, say d < 10. 

By using 3 and Se in the definition above, we have already defined the random-effects structure of the 
GLMM. To be more precise, we have specified which variables have an additional random effect and how 
the structure of looks like (e.g. multiple of the identity or diagonal). A discussion of how to find 
these structures is beyond the scope of this paper. 

Let us write So in terms of its Cholesky decomposition = A#Aq and introduce the (unobserved) 
random variable U defined by 3 :— K-eU where U ~ Af q (0, l q ). Then the linear predictor rj can be 
written as r\ — X(3 + ZAgu. We estimate the parameters f3, 9 and <fi (if unknown) by the maximum 
likelihood method and predict the random-effects u. 
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2.2 Likelihood function 



Employing the notation Ci(A*i) — 0), the likelihood function of a GLMM is given by the following 
expression: 



L(J3, 0, 4>) = fl exp j^' 1 (Wi(/3, 0) - 6(&(/3, 0))) + c( W , 0) J 
= (^l eXP {g 



(2tt)«/ : 



■ exp 



{-ilNII} 



w6(/3,0)- 6(609, 0)) 



+ c(y*,<£) 



(1) 



In general, the integral (ffl) can not be w o rked out analytically and numeric al ap proximations are required, 
see lSkrondal and Rabe-Heskethl (|2004l ). iMolenberghs and Verbekel (|2005h and|jiangj (|2007l) 



2.3 The GLMMLasso estimator 

We now turn to the high-dimensional setting where the number of fixed-effects variables p is much 
larger than the number of observations n, i.e. we study the so-called n <C p setup. 

Let us assume that the true underlying fixed-effects vector (3q is sparse in the sense that many 
coefficients of 0o are zero. To enforce sparsity, we advocate a Lasso-type approach. This means that we 
add an ^-penalty for the fixed-effects vector (3 to the likelihood function. Thus, we are going to consider 
the following objective function: 



-21ogL(/3,0,< 



(2) 



where A > is a regularization parameter. Appropriate choices for A are discussed in Section 2] 

We aim at estimating the fixed-effects parameter (3, the covariance parameter 6, and if unknown the 
dispersion parameter cj>, by 

0,0,4>):=axgminQ x (f3,e,<p). (3) 
P,o,<t> 

We call ^ the GLMMLasso estimator. Since the likelihood function ([T]) comprises analytically in- 
tractable integrals (except for the Gaussian case), some approximations have to be used. We are going 
to illustrate the algorithm using t he Lap l ace app roximation. For GLMMs, it is accurate with low compu- 
tational burden, as advocated by Bates! ( 20 1 1 bf) . A thor ough discussion of the accuracy and limitations 
of the Laplace approximation can be found in Joe ( 20081) . Generally, the Laplace approximation is used 
to calculate integrals of the form 

e'^du, (4) 



1 = 



where S(u) is a known function of a g-dimensional variable u. Let 

u = argmax— S(u) 

u 

(i.e. S'(u) — 0), then the Laplace approximation of I is given by 

(2 7 r)«/ 2 |5"({ t )r 1/2 e- S(l 



I ^ I 



LA 



(5) 



(0) 



T he mode u in (O is calculated by the penalized iterative least squares (PIRLS) algorithm. It is presented 
in iBatesI (|2011rJ) and described in the supplemental materials. The PIRLS algorithm is related to the 
iterative reweighted least squares (IRLS) algorithm for obtaining the maximum likelihood estimator in 
generalized linear models. 

It should be noted that u depends on (3, 8 and 0. From ([T]) and (JSJ) we deduce that the Laplace 
approximation of the objective function Q\(.) in ((2]) is 




+ 1 

where Wp fi ,<t> = diag " 1 (dM jiijfl, 0))g' {^((3, 6)) 

' 98S 





c{ Vi ,ct>) + log|(ZA e ) T W /w (ZA e ) + (7) 



and v{.) is the known conditional variance func- 



tion ( McCullagh and Nelderl . Il989t ) . The estimator © is then approximated by 

LA 6 LA J LA ) :=argminQ^(/3,0». 



(8) 
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We call © the GLMMLasso^" 4 estimator. It is the approximation (JSJ) to the objective function © 
that is optimized to obtain the parameter estimates. Moreover, we would like to emphasize that © is 
a non-convex function with respect to (/3, 0, 4>) consisting of a non-convex loss function and a convex 
penalty. 



3 Computational algorithm 



In this section, we present the comput ational algorithm to ob tain the GLMMLasso iA estimator 
§8§ . The algorithm is based on ideas from iTseng and Yual (l2009h of the (block) coordinate gradient 
descent (CGD) method. The notion of the CGD algorithm is that we cycle through components of 
the full parameter vector ip := ((3, 6, 4>) £ 

with respect to one parameter while keeping the other parameters fixed, 
quadratic approximation and perform an in direct line search t o ensure that the objec t ive function de 
crease s. (Bl ock) CGD algorithm s are u sed in lMeier et al and Langel (|2008t ). iFriedman et al 



and minimize the objective function Q x 



LA (.) only 
In doing so we calculate a 



(|2010h and iBrehenv and Huand (|201lh and are now extremely popular in high-dimensional penalized 
regression problems. 

We first give an overview of the algorithm which exactly solves the minimization problem © before 
considering an approximate algorithm which finds a solution close to the exact minimization of ([5]). 
Finally, we present some details of the algorithm. 



3.1 The exact GLMMLasso algorithm 

We describe here an exact algorithm, called exact GLMMLasso (we notationally omit the involved 
Laplace approximation), for the Laplace approximated objective function in (|8]). Let us write ([7]) with a 
different notation to ease the presentation. For tp = ((3, 6, <fi) 6 R p+d+1 , define the function 

f{i,) := -2 g | vMM-b^M) + c(y .^| + lQg \ {ZAe f WAZAg) + lg \ + 

Now ([8]) can be written as *l>x A = argmin^, Q^ A (xf)) := f(?p) + X\\/3\\i. Let ej be the jth unit vector and 
( s ) the sth iteration step, and let 

be the estimates of f3, 6 and 4> in the sth iteration. Using 

p(s,s-i,p k ) := (^),...,4i,^, / 9f- 1 1 ),..., / 9(«-i)) r , 

e^- 1 ^ := (e['\. . . , oWaA+i^ ■ • ■ ' ^~ 1} ) T ' 

the exact GLMMLasso algorithm is summarized in Algorithm [T] 

Particularly in the high-dimensional setting, the calculation of the quadratic approximation requires 
a large amount of computing time. Therefore it is interesting to examine a much faster approximate 
algorithm. 



3.2 The (approximate) GLMMLasso algorithm 

In the exact Algorithm[T]above, we consider in step (1) b) the mode it as a function of the parameters, 
i.e. u = u(f3, 9, (f>). However, the calculation of the derivatives of /(.) with respect to j3k is computation- 
ally intensive. This becomes a major issue in the high-dimensional setting where a substantial amount of 
computing time is allocated to this particular part of the algorithm. In addition, the exact GLMMLasso 
algorithm requires a large number of outer iterations s. To attenuate these difficulties, we propose a 
slightly modified version of Algorithm [1] We suggest performing the quadratic approximation and the 
inexact line search while considering u as fixed and not depending on Denoting by f(.\u) the function 
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Algorithm 1 Exact GLMMLasso algorithm 



(0) Choose a starting value i/> (0) = (/3 (0) , # (0) , </> (0) )- 
Repeat for s = 1, 2, . . . 

(1) (Fixed-effects parameter optimization) 
For k = 1, . . . , p 

a) (Laplace approximation) 

Calculate the Laplace approximation 



Q LA^(s,s-l;k) i9 (s-l)^(s-l)\ 



b) (Quadratic approximation and inexact line search) 
i) Approximate the second derivative 

a 2 



by At > as described in the subsection below, 
ii) Calculate the descent direction d^ 5 ' € IR 



+ ld 2 h[ s) + XW^-"- 1 ^ +de fc ||i}. 
i) Choose a step size > and set /3<.°>*-l>l*+l) = p(a, a -l;k) + a <>) rfW ejfc suc h that 



(2) (Covariance parameter optimization) 
For I = 1, . . . , d 



«(») _ ar „ 



»! V 7 

(3) (Dispersion parameter optimization) 

_ argminQ^ A (/3( s ),6»( s ),^. 

until convergence. 

Algorithm 2 (Approximate) GLMMLasso algorithm 
Denote by it = u(/3( s > s - 1;,: ) , G^" 1 ) , </>( s_1 >) . Replace in Algorithm [T]i) - iii) by 
i') Approximate the second derivative 



j^(«,'-iA) |e (.-i)^(<-4j 



by At > as described in the subsection below, 
ii') Calculate the descent direction Ou £ R 



+ irf»hW+A||/3C'-- 1 i*)+de fc ||i}. 
iii') Choose a step size aj. s) > and set j3(«>«- 1 >>>+ 1 ) = jOC 8 ' 8-1 '*) + a^d^e k such that 



/(.) for which u is considered as fixed, the (approximate) GLMMLasso algorithm is given in Algorithm 

12 

We illustrate in the supplemental materials that the approximate GLMMLasso algorithm speeds 
up remarkably without loosing that much accuracy. Additionally, the approximation emphasizes the 
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importance of a refitting as advocated in the next section. 



3.3 Convergence behaviour and details of the GLMMLasso algorithm 



Numerical convergence. The convergence of the exact GLMMLas so algorithm to a stationary point 
can be proofed using the results presented in iTseng and Yun (|2009l) . It is worth pointing out that in 
the low-dimensional framework, the exact GLMMLasso algorithm with A = (no penalization) gives the 
same results as the function glmer in the R package lme4. 



(0) Starting value ip(°> . As starting value for /3, we fit a generalized linear model with the Lasso 
where the regularization parameter is chosen by cross-validation. The initial values for 8 and <f> are then 
calculated by the steps (2) and (3) in the Algorithms [Tj and [2J 



i) Choice ofh^ . For h ( k ' we choose the A;th diagonal element of the Fisher information of a generalized 
linear model. Hence we use the second derivative of the first summand in (J7J) . We set c m i n < h k < c max 



for positiv e constants c m ,; n and c 



converges ([Tseng and Yunl . 2009) 



(e.g. c„ 



10 5 and c„ 



10 ) in order that the algorithm 



ii ) Calculation of . The value d k s ^ is the minimizer of the quadratic approximation of the objective 
function Q^ A (.) and analytically given by 



4 S) - < 



, A - d/dp k Jp k -A - d/dPkfp k 

median | -r- s — , —pk, — 



M 



» 



ffc 



Pk penalized 
else. 



(9) 



where f Pk = f ((3 {s ' s ~ 1 *\ 8^~ l \ 
gorithm [3] 



in Algorithm [Hand f Pk = /(/3(*' a - 1 ; k ),0(*- 1 ),^ a - 1 )|€i) in Al- 



iii) Choice of a k . The step length a k is chosen such that the objective function Qx A {-) decreases. 
We suggest using the Armijo rule, which is defined for Algorithm [TJ as follows (and correspondingly for 
Algorithm [2] with fixed u): 



Armijo rule: Choose a k mt > and let ai be the largest element of {a™*<5'}i = o,i.2... satisfying 



QLAfp{.,.-X;k) 



<Q L x A (f3 {s ' s ~ l *\8^ 



where A k := d/d(3 k g Pk d { k s) + j(d { ^) 2 h ( k s) + X^-^ 1 ^ + rf£ s) e fc ||i - XW^'-^Wi. 

The choice of the constants comply with the suggestions in iBertsekasI (|l999l ). e.g. a k mt = 1, 5 = 0.5, 
p = 0.1 and 7 = 0. 



Active Set Algorithm. If we assume that the true fixed-effects parameter /3o is sparse in the sense 
that many elements are zero, we can reduc e the c omp uting time remarkably by using an active set al- 
gorithm. This is also used in Mei er et al. (l2008h and iFriedman et al. I (|2010h . In particular, we only 
cycle through all p coordinates every Dth iteration, otherwise only through the current active set 
S^O" 1 )) = {k : I3 { k ~ 1] ^ 0}. Typical values for D are 5 and 10. 



An implementation of the algorithm is given in the R package glmmlasso and will be made available 
on CRAN (http://CRAN.R-project.org/) and the first author's website. 



4 The two-stage GLMMLasso 7 ^ estimator (s) 



From the soft-thresholding property of the Lasso in linear models (jTibshirani . 1996 ) and in Gaussian 
linear mixed models ([Schelldorfer et all 120111 ). the fixed-effects estimate /3 is biased towards zero. In 
some generalized linear mixed models the estimate of the covariance parameter is biased, too. To 
mitigate these bias problems and the approximation error induced by using the approximate GLMMLasso 
algorithm, we advocate a two-stage procedure. The first step aims at estimating a candidate set of 
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predictors S and can be seen as a variable screening procedure. The purpose of the second step is a 
more unbiased estimation of the parameters using unpenalized maximum likelihood (ML) estimation 
based on the selected variables S from the first step. The proposed two-stage GLMMLasso algorithm is 
summarized in Algorithm [3J 



Algorithm 3 Two-stage GLMMLasso algorithm 

Stage 1: Compute the GLMMLasso ij4 estimate (JSJ and the set S. 
Stage 2: Perform unpenalized ML estimation. 



In the next subsections, we are going to discuss the specification of the set of variables S. We propose 
two m ethods from the high-dimensional linear regression framework, and we do not consider the adaptive 
Lasso (fZcol . l2006h . 



4.1 The GLMMLasso Lj4 -MLE hybrid estimator 

The L ARS-OLS hybrid es timator was examined in Efron et al. ( 20041) and also used in Meinshausenl 



(|2007j ) and lMeier et al.l (120081 ). In our context, it becomes a two-stage procedure where the model is refit- 
ted including only the covariates with a nonzero fixed-effects coefficient in 0i n it, where (0i n it, Omit, 4>init) 
denotes the initial estimate from ([5J. More specifically, choose S = Smit '■= {k ■ [fik.init ^ 0}. Then the 
GLMMLasso ij4 -MLE hybrid estimator is given by 



($, 9, (j)) hybrid ■= argmin -21ogZ(/3§. ,0,< 



(10) 



where for S C {1, . . . ,p}, (J3 s ) k = j3 k if k G S and ((3 s ) k = if k <£ S. 

4.2 The thresholded GLMMLasso LA estimator 

The thresholded Las so wi th refitting in high-dimensional linear regression models was examined in 
van de Geer et all (j201ll) and[Zhou| (|2010h . We define the set Sthres to be the set of variables which have 
initial fixed-effects coefficients larger than some threshold Xthres > 0, i.e. we choose S = Sthres '■— {k ■ 
\$k,init\ > ^thres}- Then the thresholded GLMMLasso 1 " 4 estimator is defined by 



(/3,0, , 



thr 



argmin -21ogL(/% ,6,, 



(11) 



The thresholded GLMMLasso Lj4 estimator involves another regularization parameter Xthres, which is 
determined by minimizing an information criterion presented in the next subsection. 



4.3 Selection of the regularization parameters 

The estimators flSJ), ([TUf and (fTTj) require the choice of the regularization parameters A and Xthres, 
respectively. We propose using the Bayesian Information Criterion (BIC) and the Akaike Information 
Criterion (AIC), defined by 

c„, A = -2 log L($, 6, $) +a{n)- df x (12) 

where a(n) = log(n) for the BIC and a(n) = 2 for the AIC. Here, df x = \{l < k < p : (3 k ^ 0}| + dim(0) 
is the sum of of the number of nonzero fixed-eff ects coefficients a nd the number of covariance parameters. 
The first summand is motivated by the work of Zou et al. ( 20071 ). Based on our empirical experience, we 
suggest for the estimators (JSJ and (ITU1) the BIC, whereas for (1X11) we advocate using the AIC (allowing 
for a larger number of variables) to select A first and then, sequentially, the BIC to select Xthres ■ We will 
compare the performance of the three estimators in the next sections. 
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5 Simulation Study 



In this section we assess the performance of the GLMMLasso ij4 estimators ©, (JTUJ) and (|TTj) . We 
compare them with appropr iate Lasso, maximum likelihood (ML) and Penalized Quasi-Likelihood (PQL, 
Breslow and Clavtonl (|1993l )) methods. 

In the main text, we only present simulation results for the high-dimensional logistic mixed model. 
Simulation studies for the low-dimensional logistic and the Poisson mixed model are included in the sup- 
plementary material. At the end of this section, we compare the GLMMLasso Lj4 estimates in a situation 
where the number of noise variables grows successively. 

First of all, let us summarize some general conclusions drawn from real data analysis and the simu- 
lation studies: 



a) 



b) 



The variable screening performance of the GLMMLasso algorithm is not only attractive for the high- 
dimensional setting, but also for low-dimensional data with a relatively large number of variables 
(say p > 20). 



The GLMM Lasso algorithm is numerically as stable as standard R func tions like glmer ( Bates . 201dt ) 
or glmmPQL dBreslow and Clavton , 1993 : Venables and Riplevl. 2002 ) when p < n. On the other 



hand, glmpath ( Park and HastieT 20071) and glmnet ( Friedman et al. . 2010h may fail to converge 



when high-dimensional models are misspecified. 

c) The main difference between the logistic and the Poisson mixed model is the shrinkage of the 
covariance parameter estimates of the GLMMLasso ij4 estimator. These estimates are severely 
biased in logistic mixed models, in contrast to the Poisson mixed model. Further differences 
between these two classes are summarized in the supplemental materials. 

d) The number of iterations s differs substantially between the classes of generalized linear mixed 
models and the data set. 



5.1 Preview for the logistic mixed model 

In this section we confine the discussion to the logistic mixed mode l because it is viewed as the most 
challenging model within the class of generalized linear mixed models ( Molenberghs and Verbekel 2005 : 



Jiang . 120071 ). As an overview, let us sum up the main findings from the simulation study in the logistic 



mixed model: 

i) The GLMMLasso Lj4 estimate from © of the covariance parameter 6 is notably biased. In other 
words, adding an £i-penalty does not only shrink the fixed effects estimate /3, but also the covariance 
parameter estimate 6. 

ii) The GLMMLasso ij4 -MLE hybrid estimator (fT0| performs better in terms of parameter estimation 
accuracy than the thresholded GLMMLasso 1 " 4 estimator (fTT|) . 

iii) The more random effects, the more important it is to use the GLMMLasso 1 " 4 for variable screening 
(instead of a Lasso ignoring the grouping structure). 

iv) The number of total iterations s needed is small, often about 15 iterations. 



5.2 High-dimensional logistic mixed model 

In all subsequent simulations schemes (including the supplemental materials), we restrict ourselves 
to the case where the number of observations per cluster is equal, i.e. n r — nc for r = 1, . . . , N . The 
covariates are generated from a multivariate normal distribution with mean zero and covariance matrix 
V with the pairwise correlation Vkk' — p [k ^ k ' and p = 0.2. Denote by /3o the true fixed effects (wherein 
(/3o)i is the intercept) and by sq the true number of nonzero fixed-effects coefficients. 

For the logistic mixed models, the intercept and the first covariate have independent random effects 
with different variance parameters. In particular, 9 = (#i, #2) and = diag(#?, , . . . , 9\, #§> • • • 5 #!) £ 
R 2N , i.e. q = 2N. We investigate the following two examples in the high-dimensional setting: 

Hi: N = 40, n c = 10, n = 400, p = 500, 9\ = 0§ = 1 and s = 5 with f3 = (0.1, 1, -1, 1, -1, 0, . . . , 0) T . 

H 2 : N = 50, n c = 10, n = 500, p= 1500,(9? =Q\ = \ and s = 5 with /3 = (0.1, 1, -1, 1, -1, 0, . . . , 0) T . 
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The fitted models are all correctly specified. Hereafter, we denote by oracle the ML estimate of the 
model which includes only the variables from the true active set. Let glmmlasso, hybrid glmmlasso 
and thres glmmlasso be the GLMMLasso 1 " 4 estimates ©, (| 10[) and (fTTj) . respectively. We compare 
the GLMMLasso Lj4 methods with the standard Lasso for generali zed linear models (whi ch ignore the 
grouping structure). For that purpose we use the glmpath algorithm ( Park and Hastiel . l2007t ) and the BIC 
as variable selection criterion. Then, let hybrid glmpath and thres glmpath be the two-stage procedures 
based on glmpath (without random effects). 

The results in the form of means and standard deviations (in parentheses) over 100 simulation runs 
are shown in Table [TJ Therein, IS^/S)! denotes the cardinality of the estimated active set and TP is the 
number of true positives (selected variables which are in the true active set). 



Table 1: Simulation results for the logistic mixed models H\ and H2 (standard deviations in parenthesis). 
t indicates that the median (mad) is shown due to one unreliable result. * means that the corresponding 
coefficient is not subject to penalization in the GLMMLasso LA estimate. 
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-1 


1 
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n 09 
u.yz 
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( n ftn\ 
(U.DU ) 


in 9Q'i 
[V.Zo ) 


in 1 o\ 
(0.19) 


in 1 v\ 
(U. It) 


in 1 v\ 
(U. It) 




glmmlasso 


6.17 


4.93 


0.41 


0.43 


0.67* 


-0.32 


0.26 


-0.34 






(1.28) 


(0.26) 


(0.26) 


(0.29) 


(0.17) 


(0.13) 


(0.13) 


(0.11) 




glmpath 


7.00 


4.75 






0.23 


-0.22 


0.22 


-0.28 






(2.37) 


(0.54) 






(0.13) 


(0.12) 


(0.11) 


(0.11) 




hybrid glmmlasso 


6.17 


4.93 


0.99 


1.03 


1.05 


-1.03 


0.99 


-1.01 






(1.28) 


(0.26) 


(0.63) 


(0.69) 


(0.24) 


(0.22) 


(0.29) 


(0.23) 




hybrid glmpath 


7.00 


4.75 


0.89 


0.94 


0.97 


-0.96 


0.96 


-0.99 






(2.37) 


(0.54) 


(0.56) 


(0.60) 


(0.36) 


(0.35) 


(0.32) 


(0.25) 




thres glmmlasso^ 


10.00 


5 


1.02 


1.11 


1.19 


-1.09 


1.11 


-1.13 






(3.71) 


(0) 


(0.70) 


(0.85) 


(0.29) 


(0.23) 


(0.20) 


(0.19) 




thres glmpath 


10.07 


4.99 


1.01 


1.04 


1.10 


-1.11 


1.10 


-1.10 






(3.77) 


(0.10) 


(0.65) 


(0.67) 


(0.29) 


(0.24) 


(0.19) 


(0.20) 


Hi 


oracle 


5 


5 


0.95 


1.02 


1.01 


-0.99 


1.03 


-1.02 










(0.49) 


(0.55) 


(0.22) 


(0.15) 


(0.18) 


(0.15) 




glmmlasso 


6.15 


4.97 


0.43 


0.45 


0.66* 


-0.31 


0.27 


-0.34 






(1.37) 


(0.17) 


(0.26) 


(0.29) 


(0.15) 


(0.10) 


(0.11) 


(0.10) 




glmpath 


7.00 


4.90 






0.22 


-0.20 


0.22 


-0.28 






(1.85) 


(0.33) 






(0.11) 


(0.08) 


(0.10) 


(0.09) 




hybrid glmmlasso 


6.15 


4.97 


0.99 


1.08 


1.04 


-1.01 


1.02 


-1.04 






(1.37) 


(0.17) 


(0.54) 


(0.59) 


(0.23) 


(0.16) 


(0.25) 


(0.18) 




hybrid glmpath 


7.00 


4.90 


0.93 


1.04 


0.99 


-0.97 


1.02 


-1.04 






(1.85) 


(0.33) 


(0.50) 


(0.55) 


(0.29) 


(0.25) 


(0.24) 


(0.17) 




thres glmmlasso^ 


14.00 


5 


1.30 


1.33 


1.26 


-1.16 


1.20 


-1.22 






(5.93) 


(0) 


(0.87) 


(0.79) 


(0.27) 


(0.28) 


(0.26) 


(0.24) 




thres glmpatht 


13.50 


5 


0.90 


1.03 


1.17 


-1.07 


1.13 


-1.15 






(5.19) 


(0) 


(0.52) 


(0.64) 


(0.25) 


(0.19) 


(0.22) 


(0.21) 



Comparing the cardinality of the active set, we see that thres glmmlasso and thres glmpath have 
much larger active sets than glmmlasso and glmpath, respectively. This is largely due to the fact that we 
employ the AIC in the first and the BIC in the second stage. This is outweighed by the advantage that 
the true effects are predominantly included in thres glmmlasso. This is in contrast to glmmlasso and 
glmpath, which may miss some true effects. The active set of glmmlasso is slightly smaller and shows 
less variability than that of glmpath. And yet the number of TP is better than for glmpath. Hence, we 
conclude that i) the existence of random effects does affect the variable selection performance of glmpath 
and ii) the thresholded procedures are favourable if we focus on true positives. 

Concerning covariance parameter estimation, we read off from the table that Q\ and Q\ are seriously 
biased for glmmlasso. This motivates the usage of a two-stage GLMMLasso iA procedure. The table 
suggests that hybrid glmmlasso is best in terms of covariance parameter estimation accuracy, slightly 
more accurate than hybrid glmpath and thres glmpath. The table shows that thres glmmlasso fails to 
achieve accurate parameter estimates. 

Looking at the fixed-effects parameter estimation accuracy, the simulation study reveals that the 
glmmlasso estimates are less biased than the corresponding glmpath estimates. And the same holds for 
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glmer p-glmer 




5 15 25 35 45 55 65 5 15 25 35 45 55 65 

number of predictors number of predictors 



glmmlasso hybrid glmmlasso 




5 15 25 35 45 55 65 5 15 25 35 45 55 

number of predictors number of predictors 



Figure 1: Minus twice out-of-sample log-likelihood for a growing number of covariates. The ML estimate 
performs badly whereas the GLMMLasso LA estimators remain stable, and they are comparable to the 
p-glmer in the low-dimensional framework. 



hybrid glmmlasso and hybrid glmpath. The fixed-effects parameter estimates of thres glmmlasso and thres 
glmpath perform inadequately. As marked by an asterisk in the table, P2 is not subject to pena lization 
for the GLMMLasso ij4 estimator since this variable has a random effect ( Schelldorfer et all 1 2 llh . Thus 



the bias of the estimate is much smaller than for the other fixed-effects coefficients. 

To sum up the simulation study, we first conclude that hybrid glmmlasso outperforms thres glmmlasso 
in terms of parameter estimation accuracy, but not in terms of the number of true positives. Second, 
glmmlasso procedures do outperform glmpath procedures as variable screening methods in the simulation 
settings, in all points under consideration (of course, glmpath is fitting a wrong model without random 
effects). 



5.3 Logistic mixed model with a growing number of noise covariates 

Here, we assess the performance of glmmlasso and hybrid glmmlasso when the number of noise 
variables grows successively. In the low-dimensional setting, we compare them with the ML estimate 
computed by the R function glmer and hence denoted by glmer. In addition, let p-glmer be the method 
which performs variable selection in the following way: Eliminate consecutively (backward selection) 
all variables with a p- value larger than 5% until the final model is attained comprising only significant 
variables. We compare these four methods in terms of minus twice of the out-of-sample log-likelihood. 
Let us fix the following random intercept model design: n = 400, N — 40, ric = 10, 6 2 = 1, /3q = 
(0, 1, —1, 1, —1). We start with p = 5 (no noise variables) and raise the number of variables to p = 65. 
The results over 50 simulation runs are depicted in Figure 1. 

The figures show that the negative out-of-sample log-likelihood values for glmer grow polynomial 
whereas the likelihoods for the other methods remain fairly constant. The increase in glmer stems from 
the fact that it overfits the model for a growing number of covariates. When focusing on the figures 
in more detail, we read off that the negative log-likelihood of glmmlasso increases slightly for larger p 
whereas the negative log-likelihood of hybrid glmmlasso remains stable. The rationale for this small 
increase in glmmlasso is that the more noise covariates, the larger the optimal A, and henceforth the 
larger the shrinkage of the fixed effects. And this leads to the increase of the out-of-sample log- likelihood. 
hybrid glmmlasso (and also thres glmmlasso) overcomes this problem and leads to a stable out-of-sample 
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likelihood irrespective of p. 



6 Illustration 

In this section we illustrate the proposed GLMMLasso 1 '" 4 estimators for Poisson regression on an 
extended real data set with count data. 



Data description. We consid er the epilepsy data from Thall and Vail ( 1990l ) which were also analyzed 



by iBreslow and Clavtonl (jl993l) . The data were obtained from a randomized clinical trial of 59 patients 
with epilepsy, comparing a new drug (Trt=l) with placebo (Trt=0). The response variable consists of 
counts of epileptic seizures during the two weeks before each of four clinic visits (V4=l for fourth visit, 
otherwise). Further covariates in the analysis are the logarithm of age (Age), the logarithm of 1/4 the 
number of baseline seizures (Base) and the interaction of Base and Trt (Base x Trt). The main ques- 
tion of interest is whether taking the new drug reduces the number of epileptic seizures compared with 
placebo. In order to assess the performance of the proposed procedure with high-dimensional data, we 
add U(— 1, 1) distributed noise predictors to get a data set with n — 236, N — 59, n r = 4 for r = 1, . . . , N 
and p = 4000. All predictors are standardized to have mean zero and standard deviation one. 



Model. Model III in IBreslow and Clavtonl (|l993l ) is a two level GLMM (jBatesl . l201dh . which is an 



extension of the single level GLMM introduced in Section 2 for more than one grouping variable. The 
model consists of two independent random intercept effects. One for subject (level 1, index r) and one 
for observation (level 2, index j). Let 2 sub and 9^ bs be the corresponding variance parameters. Then the 
linear predictor can be written as 

log(/x r j) = 7] rj = x^jP + 9 sub u r + d obs u rj r = 1, . . . , 59, j = 1, . . . , 4. 

Results. The results of the analysis are presented in Table [2l In the first column we show the 
estimates for Model III without performing variable selection. Therein, Intercept, Base and Trt are 
significant at the 5% level (indicated by If we perform backward selection using the BIC we end 
up with a model including Intercept and Base only. And this model coincides with the one selected by 
glmmlasso. Hybrid glmmlasso overcomes the bias problems of glmmlasso and it yields a better model 
in terms of the BIC. Thres glmmlasso includes additional noise variables, thereby achieving the smallest 
BIC score for all models under consideration. Comparing hybrid glmmlasso and thres glmmlasso, the 
table suggests that the additional covariates in the latter model reduce the variability while keeping the 
fixed-effects estimates unaltered. 



Table 2: Results for the epilepsy data. Model III is based on 6 fixed-effects covariates while the other 
methods are based on p — 4000 variables, including 3994 noise covariates. ' indicates that the corre- 
sponding coefficient is significant at the 5% level. * means that five noise variables are selected, but not 
shown in the table. S((3) — {k : [3^ ^ 0} is the total number of selected variables. 





Model III 


glmmlasso 


hybrid glmmlasso 


thres glmmlasso 


BIC 


527.3 


571.8 


515.5 


480.3 


S($) 


6 


2 


2 


7* 


Intercept 


1.58+ 


1.62 


1.58 


1.58 


Base 


0.66 1 " 


< 10" 4 


0.74 


0.75 


Trt 


-0.47 1 " 








Base x Trt 


0.36 








Age 


0.11 








V4 


-0.04 








sub 


0.21 


0.68 


0.25 


0.28 




0.13 


0.12 


0.13 


0.04 



7 Concluding Remarks 

We address the problem of estimating high-dimensio nal generaliz ed linear mixed models (GLMMs). 
While low-dimensional generalized linear mixed models ( Bates . 2010l ) and high-dimensional generalized 
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linear models (|van de Geerl . 120081 ) have been extensively studied in recent years, little attention has been 
devoted to high-dimensional GLMMs. We provide an efficient algorithm for the ^i-penalized maximum 
likelihood estimator, called GLMMLasso. It is based on the Laplace approximation, coordinatewise 
optimization and a speeding up approximation. The method should be typically used as a screening 
procedure to estimate a small set of important variables. We propose refitting by maximum likelihood 
to get accurate parameter estimates. The second stage is much more important than for linear models, 
because £i-shrinkage can lead to severe bias problems for the estimation of the variance components. 
Our work is primarily a contribution addressing the numerical challenges of performing high-dimensional 
variable selection and parameter estimation in nonlinear mixed-effects models involving a non-convex 
loss function. An implementation of the algorithm can be found in our R package glmmlasso. It will be 
made available on CRAN and the first author's website. 



SUPPLEMENTAL MATERIALS 

Appendices: Details of the PIRLS algorithm, the comparison of the exact and approximate GLMM- 
Lasso algorithm and additional simulation studies (see below). 

R-package for GLMMLasso : R-package glmmlasso containing code to perform the GLMMLasso 
algorithm (see http://stat.ethz.ch/people/schell). 
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Appendices to 



"GLMMLasso: An Algorithm for High-Dimensional 
Generalized Linear Mixed Models Using ^-Penalization" 

Jiirg Schelldorfer and Peter Biihlmann 



Appendix A: PIRLS algorithm 

In this section, we explain how to determine the mode u — argmax M —S(u) (introduced in Section 2 
of the article). We have to solve the following minimization problem: 



u = argmm 



S(u) :=-£ 



2/i&(u) - &(&(«)) 



+ c{yi,<f>) \ + h\u 



(13) 



We would like to highlight that S(u) is a convex function. We employ the Newton-Raphson algorithm 
to find a global minimum. From (|13[) we get 

S'(u) = -(ZA e ) T B{y- t i) + u , S"(u) = (ZA g ) T W(ZA g ) + 1, 
where W = di ag" 1 (<j>v(Hi)g'(fJ-i) 2 ^ , B = diag -1 ^<fiv(pi)g'(fii)J and v(.) is the conditional 



ance function (jMcCullagh and Nelder 
Algorithm 21 which is also described in 



1989) . Then follo wing the lines in lHastie et al. ( 20091) . we get 



Bates! (|2011al) and|BatesJ (|2011b| ) 



Algorithm 4 PIRLS algorithm 



Choose a starting value or set = 0. 
Repeat for r = 0, 1, 2, . . . 

r]^ =X/3 + ZA e u {r) 

M (r) = .<rV r) ) 

1 



W {r) = diag 

B (r) = diag | 1 

VM^(/4 r) )A =1 

z (r) = (ZAg)u^ + G (r) B (r) (2/ - M M ) 



Then solve 
until 



>o4 r Va«i p) ) a , 
i \ 



GW = diag(^(MfV(/zf) 2 )" =i 



(ZA e ) T VT (r) (ZA e ) + l„W r+1) = (ZA e ) T T¥ (r) z (r 



|T7( r+1 )-r7^|l 2 



< tol 



Set tt = 



The PIRLS algorithm converges fast. To further speed up Algorithm 1 and 2, we use the current 
value of u as starting value in step (1) a) of Algorithm 1. Consequently, the number of iterations required 
to update u is indeed small, often smaller than three. 
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Appendix B: Comparison of the exact and approximate GLMM- 
Lasso algorithm 

In this section, we compare the exact and the approximate algorithm (i.e. Algorithm 1 and 2) on 
various simulated data sets. We use the same model settings as in the simulation studies (see Section 5 
of the main article and Appendices C and D). 

First of all, let us give an overview of the key findings about the approximate version of the algorithm: 

1. The approximate algorithm is substantially faster than the exact algorithm (often more than 50%). 

2. For the logistic mixed model, the loss in accuracy (with respect to variable selection and parameter 
estimation) is very small. 

3. For the Poisson mixed model, the loss in accuracy stems from the selection of too many covariates 
with very small fixed-effects coefficients. This problem is effectively alleviated by the proposed 
two-stage procedures (see Section 4) . 

In detail, we compare the algorithms in terms of computing time, number of iterations, likelihood 
function, the active set and the fixed-effects estimation accuracy. Denote by x e the measure for the exact 
and x a the corresponding measure for the approximate GLMMLasso algorithm. Precisely, let rel.time = 
t a /t e be the relative (cpu) time, rel.iter = Iter" / Iter e be the relative number of outer iterations s, 
rel.ll = \£ a -£ e \/\£ e \ the relative difference of the likelihood function values, rel.fix — \\(3 a -/3 e ||2/||/3 e ||2 
be the relative difference of the fixed-effects parameters and activeSet the percentage of models where 
the active sets completely coincide for the exact and the approximate algorithm. For each model, we 
carry out 50 simulation runs. And for each run, we compare the results of the algorithm on a sequence 
of 21 A-values. The results in the form of means and standard deviations (in parentheses) are depicted 
in Table [3] (logistic mixed model) and Table @] (Poisson mixed model) . 

Table 3: Simulation results (mean values, standard deviations in parentheses) for logistic mixed models 
(Section 5 and Appendix C). 



Model 


rel.time 


rel.iter 


rel.ll 




rel.fix 


activeSet 


LI 


0.58 


0.63 


6 x 10" 


-4 


0.02 


0.98 




(0.18) 


(0.28) 


(3 x 10" 


-4) 


(0.01) 


(0.04) 


L2 


0.41 


0.61 


8 x 10" 


-4 


0.02 


0.87 




(0.09) 


(0.18) 


(5 x 10" 


-4) 


(0.01) 


(0.07) 


HI 


0.21 


0.67 


9 x 10" 


-4 


0.02 


0.83 




(0.07) 


(0.32) 


(7 x 10" 


-4) 


(0.01) 


(0.09) 


H2 


0.28 


0.77 


8 x 10" 


-4 


0.02 


0.84 




(0.14) 


(0.96) 


(6 x 10" 


- 4 ) 


(0.01) 


(0.08) 



We see for both the logistic and the Poisson mixed model that the approximate algorithm requires 
noteworthy less computing time and outer iterations. The gain in computing time is impressive and 
often more than 50%. It is apparent that the two procedures yield similar likelihood function values, 
although the Poisson mixed model has larger differences than the logistic mixed model. We read off 
from Table [3] that the parameter estimates are very similar and that the active sets coincide well. Table 
S] suggests that the active sets and the parameter estimates differ considerably more between the exact 
and the approximate algorithm. By a closer look, we do see that the differences originate in the fact that 
for some data sets the approximate algorithm selects more variables, but with very small fixed-effects 
coefficients. This explains the low values of activeSet. This problem can be effectively addressed by the 
two-stage procedures presented in Section 4 of the article. 

To sum up, the simulations do not only encourage the attractiveness of the approximate algorithm 
with respect to speed, but also the need for the two-stage procedures. 



Appendix C: Low-dimensional logistic mixed model 

In the low-dimensional setting, we compare our met hods with the unpenal i zed m aximum likelihood 
(ML) estimate and the Penalized Quasi-Likelihood fPQL. lBreslow and Clayton (1993)) estimate. Denote 
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Table 4: Simulation results (mean values, standard deviations in parentheses) for Poisson mixed models 
(Appendix D). 



Model 


rel.time 


rel.iter 


rel.ll 


rel.fix 


activeSet 


LI 


0.10 


0.10 


46 x 10~ 4 


0.12 


0.96 




(0.02) 


(0.02) 


(91 x 10~ 4 ) 


(0.07) 


(0.11) 


L2 


0.06 


0.11 


27 x 10~ 4 


0.13 


0.88 




(0.01) 


0.02 


(8 x 10- 4 ) 


(0.10) 


(0.10) 


HI 


0.11 


0.17 


524 x 10~ 4 


0.33 


0.30 




(0.02) 


(0.03) 


(697 x 10~ 4 ) 


(0.35) 


(0.26) 


H2 


0.09 


0.17 


698 x 10~ 4 


0.38 


0.31 




(0.02) 


0.04 


(951 x 10~ 4 ) 


(0.38) 


(0.29) 


H3 


0.19 


0.92 


1296 x 10~ 4 


0.10 


0.05 




(0.07) 


(0.39) 


(451 x 10" 4 ) 


(0.03) 


(0.11) 



them by glmer and glmmPQL, respectively. The comparison begs the question of how to perform variable 
selection for glmer and glmmPQL. We need some kind of variable selection procedure such that the results 
remain comparable with our methods. Hence we suggest to reduce iteratively (backward selection) the 
number of covariates by dropping those whose p- value is greater than 5%. By doing so, we end up with 
a model where all variables are significant. We denote these methods by p-glmer and p-glmmPQL. 
We present the following two examples in the low-dimensional setting: 

Lf. N = 40, n c = 10, n = 400, p = 10, 0? = 0| = 1 and s = 5 with (3 = (0.1, 1, -1, 1, -1,0,..., 0) T . 

L 2 : N = 40, n c = 10, n = 400, p = 50, 0f = 6\ = 1 and s = 5 with (3 = (0.1, 1, -1, 1, -1,0,..., 0) T . 

The results in the form of means and standard deviations (in parenthesis) over 100 simulation runs are 
depicted in Table [SJ Therein, \S(f3\ denotes the cardinality of the estimated active set and TP the 
number of true positives. 

To summarize the variable selection results, we see that the thresholded procedures have the smallest 
active sets. On the contrary, their number of true positives is slightly below 4. Table [5] suggests that the 
covariance parameter estimates of glmmlasso are considerable biased whereas the covariance parameter 
estimates of the other procedures are very similar. Concerning fixed-effects parameter estimation, the 
two-stage approaches perform better and do not show striking differences. Since /3i and /?2 are not subject 
to penalization (indicated by *), their bias is smaller compared with the pena lized coefficie nts. It can 
be seen from the table that the parameter estimates of p-glmmPQL are biased (j Jiang . 2007 ). The main 



conclusion is that thres glmmlasso performs worse than hybrid glmmlasso in the logistic mixed model. 

Appendix D: Simulation study for the Poisson mixed model 

We are going to present some simulations where the conditional response variable follows a Poisson 
distribution. It is interesting since the behaviour is different from the binary case. Let us look at 
two low-dimensional and three high-dimensional designs. Beforehand, let us sum up the most relevant 
findings. 

D.l Summary for the Poisson mixed model 

In this subsection, we are going to give an overview over th e properties of the Poisso n mixed model. 



We focus on the similarities and differences to the Gaussian (jSchelldorfer et all 120111 ) as well as the 
binary case (Section 5 and Appendix C). 

i) Shrinkage of the covariance parameters due to the ^i-penalization approach is not an issue. This 
is in contrast to the logistic mixed model and similar to the Gaussian case. 

ii) If we apply the Lasso ignoring the grouping structure within the observations, the recovery of the 
true active set fails. 



17 



Table 5: Simulation results for the logistic mixed models L\ and L 2 (standard deviations in parenthesis). * 
means that the corresponding coefficient is not subject to i\-penalization in the GLMMLasso LA estimate. 



Model 


Method 


\S(0)\ 


TP 


l 


6)2 

2 


/3i 


P'2 


ft 


Pi 


Pr> 


True 




5 


5 


i 


l 


0.1 


1 


-1 


1 


-1 


Li 


oracle 


5 


5 


1.02 


0.94 


0.10 


0.99 


-1.02 


1.00 


-1.02 






_ 


_ 


(0.54) 


(0.55) 


(0.21) 


(0.26) 


(0.20) 


(0.21) 


(0.18) 




glmmlasso 


6.32 


5 


0.70 


0.62 


0.09* 


0.83* 


-0.74 


0.70 


-0.75 






(1.32) 


(0) 


(0.38) 


(0.37) 


(0.18) 


(0.22) 


(0.18) 


(0.20) 


(0.18) 




glmpath 


6.29 


5 






0.08 


0.50 


-0.55 


0.54 


-0.58 






(0.97) 


(0) 






(0.15) 


(0.16) 


(0.16) 


(0.18) 


(0.15) 




hybrid glmmlasso 


6.32 


5 


1.05 


0.98 


0.11 


1.01 


-1.03 


1.02 


-1.03 






(1.32) 


(0) 


(0.57) 


(0.56) 


(0.21) 


(0.27) 


(0.20) 


(0.21) 


(0.19) 




hybrid glmpath 


6.29 


5 


1.02 


0.95 


0.11 


1.00 


-1.03 


1.01 


-1.03 






(0.97) 


(0) 


(0.54) 


(0.56) 


(0.21) 


(0.27) 


(0.20) 


(0.21) 


(0.19) 




thrcs glmmlasso 


5.10 


4.99 


1.02 


0.95 


0.10 


0.99 


-1.02 


1.01 


-1.03 






(0.33) 


(0.10) 


(0.54) 


(0.55) 


(0.21) 


(0.28) 


(0.20) 


(0.21) 


(0.19) 




thrcs glmpath 


5.07 


4.99 


1.02 


0.95 


0.10 


0.99 


-1.02 


1.00 


-1.02 






(0.29) 


(0.10) 


(0.54) 


(0.55) 


(0.21) 


(0.28) 


(0.20) 


(0.21) 


(0.19) 




p-glmer 


5.26 


5 


1.03 


0.96 


0.10 


1.00 


-1.03 


1.01 


-1.03 






(0.50) 


(0) 


(0.54) 


(0.56) 


(0.21) 


(0.26) 


(0.20) 


(0.21) 


(0.19) 




p-glmmPQL 


5.41 


5 


1.06 


0.97 


0.10 


0.91 


-0.97 


0.95 


-0.96 






(0.62) 


(0) 


(0.53) 


(0.51) 


(0.20) 


(0.24) 


(0.19) 


(0.20) 


(0.19) 


L 2 


oracle 


5 


5 


1.01 


0.86 


0.09 


1.00 


-1.00 


1.03 


-1.01 






_ 


_ 


(0.50) 


(0.48) 


(0.22) 


(0.25) 


(0.19) 


(0.20) 


(0.17) 




glmmlasso 


6.35 


5 


0.51 


0.40 


0.07* 


0.73* 


-0.47 


0.45 


-0.50 






(1.27) 


(0) 


(0.28) 


(0.26) 


(0.17) 


(0.18) 


(0.14) 


(0.14) 


(0.12) 




glmpath 


6.89 


4.99 






0.06 


0.37 


-0.36 


0.38 


-0.42 






(1.57) 


(0.10) 


_ 


_ 


(0.14) 


(0.14) 


(0.12) 


(0.12) 


(0.10) 




hybrid glmmlasso 


6.35 


5 


1.06 


0.91 


0.09 


1.03 


-1.01 


1.05 


-1.03 






(1.27) 


(0) 


(0.52) 


(0.53) 


(0.23) 


(0.25) 


(0.20) 


(0.21) 


(0.17) 




hybrid glmpath 


6.89 


4.99 


1.03 


0.89 


0.09 


1.03 


-1.01 


1.05 


-1.03 






(1.57) 


(0.10) 


(0.50) 


(0.50) 


(0.23) 


(0.26) 


(0.21) 


(0.23) 


(0.18) 




thrcs glmmlasso 


5.67 


5.00 


1.06 


0.90 


0.09 


1.03 


-1.02 


1.05 


-1.04 






(1.00) 


(0) 


(0.54) 


(0.52) 


(0.23) 


(0.25) 


(0.20) 


(0.21) 


(0.18) 




thres glmpath 


5.51 


4.99 


1.03 


0.89 


0.09 


1.01 


-1.01 


1.05 


-1.03 






(0.78) 


(0.10) 


(0.51) 


(0.50) 


(0.23) 


(0.27) 


(0.20) 


(0.20) 


(0.17) 




p-glmer 


5.90 


5.00 


1.07 


0.92 


0.09 


1.03 


-1.02 


1.06 


-1.04 






(1.64) 


(0) 


(0.53) 


(0.55) 


(0.22) 


(0.26) 


(0.20) 


(0.21) 


(0.17) 




p-glmmPQL 


5.67 


4.98 


1.05 


0.89 


0.08 


0.91 


-0.94 


0.96 


-0.94 






(1.75) 


(0.20) 


(0.48) 


(0.45) 


(0.20) 


(0.22) 


(0.19) 


(0.22) 


(0.19) 



iii) For Poisson mixed models, thres glmmlasso performs best whereas in logistic mixed models hybrid 
glmmlasso is preferable (see Appendix B). 

iv) For the Lasso, we carried out the R function glmpath for a comparison. However, in all our high- 
dimensional simulation settings, the function breaks down. Hence we employ the R function glmnet 
for comparisons. 

v) We observe a slow convergence rate (i.e. many outer iterations are required until convergence) in 
various real data applications. At the same time, convergence problems do occur in glmnet, too. 
The number of total iterations is often more than 100. 

D.2 Low-dimensional Setting 

For the Poisson mixed models simulation study, we look at random-intercept designs. This means 
that only the intercept has a random effect. Particularly, flel and Eg = d 2 l qi i.e. q = N. We present 
two examples in the low-dimensional setting. 

Li: N = 20, n c = 10, n = 200, p = 10, 6 2 = 1 and s = 5 with [3 = (^, §, -±, §, -±,0, . . . ,0) T . 

L 2 : N = 20, n c = 10, n = 200, p = 50, 6 2 = 1 and s = 5 with f3 = (^, |,-±, ±, -\, 0, . . . , 0) T . 
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Table 6: Simulation results for the Poisson mixed models L\ and L2 (mean values and standard deviations 
in parenthesis). * indicates that the corresponding coefficient is not subject to l\-penalization in the 
GLMMLasso LA estimate. 



Model 


Method 


|5(/3) 


TP 


P 


fa 


fa 


fa 


fa 


fa 


True 




5 


5 


I 


0.05 


0.5 


-0.5 


0.5 


-0.5 


Li 


oracle 


5 


5 


0.93 


0.11 


0.50 


-0.50 


0.50 


-0.50 






(-) 


(-) 


(0.34) 


(0.25) 


(0.05) 


(0.07) 


(0.06) 


(0.05) 




glrrimlasso 


6.84 


5 


0.92 


0.23* 


0.44 


-0.43 


0.42 


-0.43 






(1.35) 


(0) 


(0.34) 


(0.24) 


(0.06) 


(0.07) 


(0.06) 


(0.06) 




glmnet 


5.67 


4.63 




0.78 


0.28 


-0.24 


0.24 


-0.26 






(1.69) 


(0.97) 


(-) 


(0.32) 


(0.15) 


(0.17) 


(0.16) 


(0.14) 




hybrid glmmlasso 


6.84 


5 


0.93 


0.10 


0.50 


-0.50 


0.49 


-0.49 






Cl 35) 


<()) 


CO 34) 


(0.25) 


(0.05) 


(0.07) 


(0.06) 


(0.06) 




hybrid glmnet 


5.67 


4.63 


0.92 


0.13 


0.46 


-0.44 


0.45 


-0.45 






Cl 69) 


CO 97) 


CO 34) 


(0.26) 


(0.14) 


(0.18) 


(0.16) 


(0.15) 




thres glmmlasso 


5.08 


5 


0.93 


0.11 


0.50 


-0.50 


0.50 


-0.49 






CO SI 1 


CO) 


CO 34) 


(0.25) 


(0.05) 


(0.07) 


(0.06) 


(0.06) 




thres glmnet 


4.74 


4.63 


0.93 


0.13 


0.46 


-0.44 


0.45 


-0.45 






Cl 11) 


CO 97) 


(0.35) 


(0.26) 


(0.14) 


(0.18) 


(0.16) 


(0.15) 




p-glmer 


5.17 


5 


0.93 


0.10 


0.50 


-0.50 


0.50 


-0.50 






CO 43 ) 


(0) 


CO 34) 


(0.25) 


(0.05) 


(0.07) 


(0.06) 


(0.06) 




p-glmmPQL 


5.19 


5 


0.89 


0.14 


0.50 


-0.50 


0.50 


-0.50 






CO 46 "1 


CO) 


(0.32) 


(0.25) 


(0.05) 


(0.07) 


(0.06) 


(0.05) 


L> 


oracle 


5 


5 


0.96 


0.03 


0.51 


-0.50 


0.49 


-0.50 






(-) 


(-) 


(0.37) 


(0.25) 


(0.05) 


(0.06) 


(0.06) 


(0.06) 




glmmlasso 


9.37 


5 


0.93 


0.24* 


0.37 


-0.32 


0.32 


-0.35 






(2.68) 


(0) 


(0.37) 


(0.23) 


(0.06) 


(0.08) 


(0.07) 


(0.07) 




glmnet 


6.53 


3.90 




0.76 


0.20 


-0.13 


0.14 


-0.18 






(4.29) 


(1.53) 


(-) 


(0.32) 


(0.13) 


(0.12) 


(0.14) 


(0.14) 




hybrid glmmlasso 


9.37 


5 


0.93 


0.03 


0.49 


-0.47 


0.47 


-0.48 






(2.68) 


(0) 


(0.37) 


(0.25) 


(0.06) 


(0.07) 


(0.06) 


(0.07) 




hybrid glmnet 


6.53 


3.90 


0.90 


0.12 


0.39 


-0.33 


0.32 


-0.35 






(4.29) 


(1.53) 


(0.38) 


(0.28) 


(0.22) 


(0.24) 


(0.22) 


(0.22) 




thres glmmlasso 


5.65 


5 


0.95 


0.03 


0.51 


-0.49 


0.49 


-0.49 






(0.88) 


(0) 


(0.37) 


(0.25) 


(0.05) 


(0.06) 


(0.06) 


(0.06) 




thres glmnet 


4.33 


3.90 


0.93 


0.12 


0.39 


-0.33 


0.33 


-0.36 






(1.88) 


(1.53) 


(0.39) 


(0.28) 


(0.22) 


(0.24) 


(0.22) 


(0.22) 




p-glmer 


5.32 


5 


0.96 


0.03 


0.51 


-0.50 


0.49 


-0.50 






(0.62) 


(0) 


(0.38) 


(0.25) 


(0.05) 


(0.06) 


(0.05) 


(0.06) 




p-glmmPQL 


5.32 


5 


0.92 


0.07 


0.51 


-0.50 


0.49 


-0.50 






(0.69) 


(0) 


(0.35) 


(0.24) 


(0.05) 


(0.06) 


(0.06) 


(0.07) 



The results over 100 simulation runs are shown in Table HJ] 

We read off from the table that the thresholded methods pick less variables than the hybrid proce- 
dures. The table suggests that recovering the true active set is accurate for the mixed models approaches, 
but fails for the ordinary Lasso ignoring the grouping structure. The glmmlasso covariance parameter 
estimates are almost not biased, hybrid glmnet and thres glmnet do not result in accurate covariance 
parameter estimates. Furthermore, the table illustrates that the fixed-effects estimates are very accurate 
for hybrid glmmlasso and thres glmmlasso, but remarkably biased for hybrid glmnet and thres glmnet. 
Overall, we conclude from this simulation study that if we ignore the grouping structure in the screening 
procedure, we fail to recover the true active variables. As a consequence, the parameter estimates of the 
corresponding two-stage procedures are inaccurate. 

D.3 High-dimensional Setting 

We examine the following high-dimensional examples: 
Hf. N = 40, n c = 10, n = 400, p = 500, 6 2 = 1 and s = 5 with f3 = (55,5,-5, 5 , -5, 0, . . . , 0) T . 
Hr- N = 40, n c = 10, n = 400, p = 1000, 6 2 = 1 and s = 5 with (3 = (±, ±, -±, ±, -±, 0, . . . , 0) T . 
H 3 : N = 30, n c = 10, n = 300, p = 500, 6 2 = 0.25 and s = 5 with [3 = (2, |, -±, ±, -±, 0, . . . , 0) T . 
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The results in the form of means and standard deviations (in parentheses) over 100 simulation runs are 
shown in Table [7] 

Table 7: Simulation results for the Poisson mixed models Hi, H2 and H3 (mean values and standard 
deviations in parenthesis). * indicates that the corresponding coefficient is not subject to i\-penalization 
in the GLMMLasso LA estimate. 



Model 


Method 


\S($)\ 


TP 


i) 2 


01 


02 


03 


04 


05 


True 




5 


5 


1 


0.05 


0.5 


-0.5 


0.5 


-0.5 


Hi 


oracle 


5 


5 


0.95 


0.06 


0.49 


-0.49 


0.50 


-0.50 






(-) 


(-) 


(0.27) 


(0.18) 


(0.04) 


(0.04) 


(0.04) 


(0.04) 




glmmlasso 


11.25 


5 


0.95 


0.28* 


0.33 


-0.30 


0.31 


-0.33 






(4.26) 


(0) 


(0.27) 


(0.17) 


(0.04) 


(0.05) 


(0.06) 


(0.05) 




glmnet 


7.63 


3.09 


- 


0.79 


0.16 


-0.13 


0.13 


-0.16 






(8.59) 


(1.43) 


(") 


(0.19) 


(0.11) 


(0.10) 


(0.12) 


(0.11) 




hybrid glmmlasso 


11.25 


5 


0.92 


0.07 


0.47 


-0.46 


0.47 


-0.47 






(4.26) 


(0) 


(0.27) 


(0.18) 


(0.04) 


(0.05) 


(0.05) 


(0.04) 




hybrid glmnet 


7.63 


3.09 


0.92 


0.08 


0.37 


-0.34 


0.36 


-0.40 






(8.59) 


(1.43) 


(0.26) 


(0.18) 


(0.19) 


(0.23) 


(0.21) 


(0.18) 




thres glmmlasso 


6.52 


5 


0.94 


0.06 


0.49 


-0.48 


0.49 


-0.49 






(1.86) 


(0) 


(0.27) 


(0.18) 


(0.04) 


(0.05) 


(0.05) 


(0.04) 




thres glmnet 


3.79 


3.09 


0.95 


0.07 


0.38 


-0.35 


0.37 


-0.41 






(2.30) 


( 1 AX\ 


(0.25) 


(0.18) 


(0.20) 


(0.23) 


(0.22) 


(0.19) 


II , 

-"2 


oracle 





u 


u.yo 


-U.Ul 


U.Ol 


-u.ou 


u.ou 


-u.ou 






(-) 


(") 


(0.27) 


(0.16) 


(0.04) 


(0.04) 


(0.04) 


(0.04) 




glmmlasso 


11.62 


5 


0.95 


0.24* 


0.32 


-0.28 


0.27 


-0.31 






(4.15) 


(0) 


(0.26) 


(0.15) 


(0.05) 


(0.06) 


(0.06) 


(0.05) 




glmnet 


9.86 


3.91 


- 


0.77 


0.16 


-0.10 


0.09 


-0.14 






(10.55) 


(1.54) 


(") 


(0.19) 


(0.12) 


(0.10) 


(0.09) 


(0.12) 




hybrid glmmlasso 


11.62 


5 


0.92 


0.01 


0.47 


-0.46 


0.46 


-0.47 






(4.15) 


(0) 


(0.26) 


(0.15) 


(0.05) 


(0.05) 


(0.05) 


(0.04) 




hybrid glmnet 


9.86 


3.91 


0.90 


0.09 


0.38 


-0.30 


0.31 


-0.36 






(10.55) 


(1.54) 


(0.29) 


(0.18) 


(0.19) 


(0.23) 


(0.22) 


(0.20) 




thres glmmlasso 


6.80 


5 


0.94 


-0.01 


0.49 


-0.49 


0.48 


-0.49 






(1.90) 


(0) 


(0.26) 


(0.16) 


(0.05) 


(0.05) 


(0.05) 


(0.04) 




thres glmnet 


4.85 


3.91 


0.94 


0.09 


0.39 


-0.31 


0.32 


-0.37 






(3.13) 


(1.54) 


(0.27) 


(0.18) 


(0.20) 


(0.24) 


(0.23) 


(0.21) 




True 


5 


5 


0.25 


2 


0.5 


-0.5 


0.5 


-0.5 


H 3 


oracle 


5 


5 


0.25 


1.99 


0.50 


-0.50 


0.50 


0.50 






(-) 


(-) 


(0.07) 


(0.11) 


(0.02) 


(0.02) 


(0.02) 


(0.02) 




glmmlasso 


10.59 


5 


0.25 


2.10* 


0.41 


-0.40 


0.39 


-0.41 






(2.91) 


(0) 


(0.07) 


(0.11) 


(0.02) 


(0.03) 


(0.03) 


(0.03) 




glmnet 


11.68 


5 




2.32 


0.33 


-0.30 


0.29 


-0.32 






(7.63) 


(0) 


(-) 


(0.12) 


(0.06) 


(0.06) 


(0.06) 


(0.06) 




hybrid glmmlasso 


10.59 


5 


0.25 


1.99 


0.49 


-0.48 


0.48 


-0.48 






(2.91) 


(0) 


(0.07) 


(0.11) 


(0.02) 


(0.02) 


(0.02) 


(0.02) 




hybrid glmnet 


11.68 


5 


0.24 


1.99 


0.49 


-0.49 


0.49 


-0.49 






(7.63) 


(0) 


(0.07) 


(0.11) 


(0.02) 


(0.02) 


(0.02) 


(0.03) 




thres glmmlasso 


5.42 


5 


0.25 


1.99 


0.50 


-0.50 


0.50 


-0.50 






(0.74) 


(0) 


(0.07) 


(0.11) 


(0.02) 


(0.02) 


(0.02) 


(0.02) 




thres glmnet 


5.09 


5 


0.25 


1.99 


0.50 


-0.50 


0.50 


-0.50 






(0.40) 


(0) 


(0.07) 


(0.11) 


(0.02) 


(0.02) 


(0.02) 


(0.02) 



In terms of the active set and the true positives, we see that glmnet performs worse than glmmlasso. 
The table reveals that the ratio of the covariance parameter to the fixed-effects coefficients is an important 
issue for the recovery of the active set. The larger # 2 , the worse glmnet performs. Considering the 
parameter estimation accuracy, the Poisson mixed model shows that the variable screening using glmnet 
fails. This is far more obvious than in the logistic mixed model. We conclude by noting that the 
Poisson mixed model clearly shows that it is of paramount importance to perform variable screening 
using glmmlasso and that it can not be carried out by just applying an standard Lasso procedure (and 
thereby ignoring the grouping structure). 



20 



